-
Self-reported biographical strings on social media profiles provide a powerful tool to study personal identity. We present Statewise, a dataset based on 50 million unique Twitter user profiles, identified as located in the United States and observed over a 12-year period. Users within this dataset can be accurately partitioned into 52 states/territories at each observation, allowing queries into state-specific language choices over time. We report on the major design decisions underlying Statewise, including the methodology behind the location detection system and measurements of user/state transitions across time. We demonstrate the power of Statewise to study the relative prevalences of different token groups, showing clear and consistent regional differences in language usage. We analyze emoji usage by comparing inclusion rates against external state-level statistics, finding that emoji inclusion shares a significant correlation with state unemployment and poverty rates. Finally, we use Gini coefficients as a measure of token usage inequality across all observed territories and demonstrate a clear stratification based on token content.
Free, publicly-accessible full text available June 7, 2026
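As an illustration of the Gini coefficient mentioned in the abstract above, the following is a minimal sketch of how inequality in one token's usage across territories could be computed. It is not the authors' code; the function name `gini` and the idea of feeding it per-territory usage rates are assumptions made for this example.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    0 means perfectly equal usage; values approaching 1 mean usage is
    concentrated in very few territories.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    # Standard rank-based formula over ascending values x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical per-territory usage rates for a single token.
evenly_used = [1.0, 1.0, 1.0, 1.0]
concentrated = [0.0, 0.0, 0.0, 1.0]
print(gini(evenly_used), gini(concentrated))
```

A token used at the same rate everywhere scores 0, while a token confined to a single territory out of n scores (n - 1) / n, so tokens can be ranked by how regionally concentrated their usage is.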
-
Self-reported biographical strings on social media profiles provide a powerful tool to study self-identity. We present HINENI, a dataset of 420 million Twitter user profiles collected over a 12-year period, partitioned into 32 distinct national cohorts, which we believe is the largest publicly available data resource for identity research. We report on the major design decisions underlying HINENI, including a new notion of sampling (k-persistence) which spans the divide between traditional cross-sectional and longitudinal approaches. We demonstrate the power of HINENI to study the relative survival rate (half-life) of different tokens, and the use of emoji analysis across national cohorts to study the effects of gender, national, and sports identities.
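To make the notion of a token half-life concrete, here is a small sketch that converts an observed survival fraction into a half-life. It assumes simple exponential decay of token retention, which the abstract does not state; the function name `half_life` and its parameters are hypothetical and may not match how HINENI defines or estimates the quantity.

```python
import math


def half_life(elapsed_years: float, surviving_fraction: float) -> float:
    """Half-life in years, assuming exponential decay (an assumption).

    surviving_fraction is the share of profiles that still contain the
    token after elapsed_years.
    """
    if elapsed_years <= 0:
        raise ValueError("elapsed_years must be positive")
    if not 0.0 < surviving_fraction < 1.0:
        raise ValueError("surviving_fraction must be strictly between 0 and 1")
    # Under S(t) = exp(-rate * t), rate = -ln(S) / t and half-life = ln(2) / rate.
    decay_rate = -math.log(surviving_fraction) / elapsed_years
    return math.log(2.0) / decay_rate


# Hypothetical example: a quarter of bios keep a token after two years.
print(half_life(2.0, 0.25))
```

Under this assumption, a token retained by one quarter of profiles after two years has gone through two halvings, giving a half-life of one year.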
-
Occupational identity concerns the self-image of an individual’s affinities and socioeconomic class, and directs how a person should behave. Understanding the establishment of occupational identity is important to study work-related behaviors. However, large-scale quantitative studies of occupational identity are difficult to perform because it is only indirectly observable. Profile biographies on social media, by contrast, contain concise yet rich descriptions of self-identity. Analysis of these self-descriptions provides powerful insights concerning how people see themselves and how they change over time. In this paper, we present and analyze a longitudinal corpus recording the self-authored public biographies of 51.18 million Twitter users as they evolve over a six-year period from 2015 to 2021. In particular, we investigate the social approval (e.g., job prestige and salary) effects in how people self-disclose occupational identities, quantifying over-represented occupations as well as the occupational transitions with respect to job prestige over time. We show that self-reported jobs and job transitions are biased toward more prestigious occupations. We also present an intriguing case study about how self-reported jobs changed amid COVID-19 and the subsequent Great Resignation trend, using the latest full-year data from 2022. These results demonstrate that social media biographies are a rich source of data for quantitative social science studies, allowing unobtrusive observation of the intersections and transitions obtained in online self-presentation.
-
Mahmoud, Ali B. (Ed.)
Billions of dollars are being invested into developing medical artificial intelligence (AI) systems, and yet public opinion of AI in the medical field seems to be mixed. Although high expectations for the future of medical AI do exist in the American public, anxiety and uncertainty about what it can do and how it works are widespread. Continuing evaluation of public opinion on AI in healthcare is necessary to ensure alignment between patient attitudes and the technologies adopted. We conducted a representative-sample survey (total N = 203) to measure the trust of the American public towards medical AI. Primarily, we contrasted preferences for AI and human professionals to be medical decision-makers. Additionally, we measured expectations for the impact and use of medical AI in the future. We present four noteworthy results: (1) The general public strongly prefers that human medical professionals make medical decisions, while at the same time believing that those professionals are more likely to make culturally biased decisions than AI. (2) The general public is more comfortable with a human reading their medical records than an AI, both now and “100 years from now.” (3) The general public is nearly evenly split between those who would trust their own doctor to use AI and those who would not. (4) Respondents expect AI will improve medical treatment but more so in the distant future than immediately.
-
Trust is predictive of civic cooperation and economic growth. Recently, the U.S. public has demonstrated increased partisan division and a surveyed decline in trust in institutions. There is a need to quantify individual and community levels of trust unobtrusively and at scale. Using observations of language across more than 16,000 Facebook users, along with their self-reported generalized trust score, we develop and evaluate a language-based assessment of generalized trust. We then apply the assessment to more than 1.6 billion geotagged tweets collected between 2009 and 2015 and derive estimates of trust across 2,041 U.S. counties. We find generalized trust was associated with more affiliative words (“love,” “we,” and “friends”) and fewer angry words (“hate” and “stupid”) but only had a weak association with social words, primarily driven by strong negative associations with general othering terms (“they” and “people”). At the county level, associations with the Centers for Disease Control and Prevention (CDC) and Gallup surveys suggest that people in high-trust counties were physically healthier and more satisfied with their community and their lives. Our study demonstrates that generalized trust levels can be estimated from language as a low-cost, unobtrusive method to monitor variations in trust in large populations.
-
Identity Activism is a new phenomenon afforded by the massive popularity of social media. It consists of the prominent display of a social movement symbol within a space reserved for description of the self. The 2022 Russian invasion of Ukraine provides a contemporary (yet unfortunate) opportunity to observe this phenomenon. Here, we introduce and explore this concept in the context of the recent Twitter trend of displaying the Ukraine flag emoji in bios and names to signal support of Ukraine. We explore several questions, including: how has the popularity of this trend changed over time, are users who display the flag more likely to be connected to others who do, and what types of users are and are not participating. We find that Ukraine flag emoji prevalence in both names and bios increased many-fold in late February 2022, with it becoming the 11th most prevalent emoji in bios and the 3rd most prevalent emoji in names during March. We also find evidence that users who display the flag in their bio or name are more likely to follow and be followed by others who also do so, as compared to users who do not. Finally, we observe that users who share politically left-leaning messages were most likely to display the emoji. Those who share account information from alternative social media sites and non-personal accounts appear least likely. These findings give us insight into how users participate in Identity Activism, what connections exist between participating users, and, in this particular case, what types of users participate.
